Bioinformatics of the Brain (Kayhan Erciyes, Tuba Sevimoğlu)


After differentially expressed genes have been identified, we may need to
determine whether, and which, groups of genes have similar expression
patterns. Tools that offer clustering methods such as k-means or hierarchical
clustering can be used here. In k-means clustering the data are partitioned
into k clusters, and each item is placed in the cluster whose mean is closest
to it [28]. It is one of the most widely used algorithms in data mining. In
hierarchical clustering nodes are compared with one another based on their
similarity and grouped into a hierarchy of nested clusters [29]. Biological
networks such as gene interaction networks enable us to comprehend collective
patterns that would not be apparent when the components are examined
individually. Although there are many tools available for visualizing
biological networks, Cytoscape remains the most popular one. Cytoscape
facilitates network analysis as well as the visualization of interaction
networks involving genes, proteins, and miRNAs [30]. Data from other sources,
such as KEGG, can also be incorporated by this tool. Enrichment analysis tools
can be used to enrich the data, or in other words to understand which
phenotype a group of genes is associated with.

These tools make use of databases such as The Cancer Genome Atlas (TCGA)
[31], Kyoto Encyclopedia of Genes and Genomes (KEGG) [32], Gene Ontology,
and PANTHER [33]. These and other databases are used to identify common
biological functions, signaling pathways, interaction networks, and more. A
widely used enrichment analysis tool is Gene Set Enrichment Analysis (GSEA)
[34]. Another enrichment analysis tool in use is the Database for Annotation,
Visualization and Integrated Discovery (DAVID)
(https://david.ncifcrf.gov/home.jsp) [35].
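
As a minimal sketch of the k-means idea described earlier in this section,
the Python code below assigns each expression profile to the cluster with the
nearest mean and then recomputes the means until the assignments stop
changing. The gene names, the expression values, and the farthest-first choice
of starting centroids are illustrative assumptions. They are not taken from
any of the tools or studies cited here, and dedicated packages would normally
be used in practice.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles):
    """Coordinate-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(values) / n for values in zip(*profiles)]


def farthest_first_centroids(profiles, k):
    """Deterministic start (an assumption of this sketch): begin with the
    first profile, then repeatedly add the profile that lies farthest from
    the centroids chosen so far."""
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(
            profiles,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans(profiles, k, max_iterations=100):
    """Return (labels, centroids) after iterating assignment and update."""
    centroids = farthest_first_centroids(profiles, k)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each profile joins the cluster with the nearest mean.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments are stable, so the means will not change
        labels = new_labels
        # Update step: recompute the mean of every non-empty cluster.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels, centroids


# Hypothetical expression values (rows: genes, columns: samples).
expression = {
    "GENE_A": [1.0, 1.2, 0.9, 1.1],
    "GENE_B": [1.1, 1.0, 1.0, 1.2],
    "GENE_C": [0.9, 1.1, 1.0, 0.8],
    "GENE_D": [8.0, 8.1, 7.9, 8.2],
    "GENE_E": [7.8, 8.0, 8.1, 7.9],
    "GENE_F": [8.2, 7.9, 8.0, 8.1],
}

labels, centroids = kmeans(list(expression.values()), k=2)
for gene, label in zip(expression, labels):
    print(gene, "-> cluster", label)
```

The number of clusters k has to be chosen in advance, which is one reason
hierarchical clustering is often run alongside k-means when the number of
groups is not known.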

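One simple way to ask whether a group of genes is associated with an
annotated gene set is an over-representation test based on the hypergeometric
distribution. The sketch below illustrates that general idea only. It is not
the method implemented by GSEA, which works on ranked gene lists, and it is
not a reimplementation of DAVID. The gene identifiers, gene-set names, and
function names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, annotated, selected_size, overlap):
    """Probability of seeing at least `overlap` annotated genes when
    `selected_size` genes are drawn at random from a universe that
    contains `annotated` annotated genes."""
    total = comb(universe_size, selected_size)
    upper = min(annotated, selected_size)
    tail = sum(
        comb(annotated, i) * comb(universe_size - annotated, selected_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


def enrich(selected_genes, gene_sets, universe):
    """Return (set name, overlap, p-value) tuples sorted by p-value."""
    universe = set(universe)
    selected = set(selected_genes) & universe
    results = []
    for name, members in gene_sets.items():
        members = set(members) & universe
        overlap = len(selected & members)
        p = hypergeom_enrichment_p(
            len(universe), len(members), len(selected), overlap
        )
        results.append((name, overlap, p))
    return sorted(results, key=lambda r: r[2])


# Hypothetical background genes, annotation sets, and gene list of interest.
universe = [f"g{i}" for i in range(1, 11)]
gene_sets = {
    "hypothetical_pathway_1": ["g1", "g2", "g3"],
    "hypothetical_pathway_2": ["g8", "g9", "g10"],
}
selected = ["g1", "g2", "g3"]

for name, overlap, p in enrich(selected, gene_sets, universe):
    print(f"{name}: overlap={overlap}, p={p:.4f}")
```

Because many gene sets are usually tested at once, the raw p-values from a
test like this would normally be corrected for multiple testing before they
are interpreted.
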
There are also databases for storing the discoveries made through the
techniques mentioned here and others. The Genetic Association Database (GAD)
is a repository of information from published genetic association studies, in
which the data and metadata presented in each study have been structured into
a common format [36]. An extensive collection of human genes and genetic
features can be found in the OMIM (Online Mendelian Inheritance in Man)
database [37]. DisGeNET is a database that compiles data on human
gene-disease and variant-disease relationships from numerous sources,
including GAD and OMIM [38]. Figure 8.2 also displays the number of genes
retrieved from DisGeNET that are connected to the diseases and disorders
under study.

8.5 Bioinformatics Studies on Brain Diseases and Disorders

Numerous experiments have been carried out since the emergence of microarray
and RNA-seq technologies. Accordingly, only recent studies involving